
feat: Kornia GPU augmentation backend for detection training #874

Open
Borda wants to merge 15 commits into develop from aug/kornia

Conversation


@Borda Borda commented Mar 25, 2026

What does this PR do?

  • Add augmentation_backend field to TrainConfig (cpu/auto/gpu); cpu is the default
  • New src/rfdetr/datasets/kornia_transforms.py: registry of 8 transform factories, build_kornia_pipeline, build_normalize, collate_boxes/unpack_boxes box utilities
  • Wire gpu_postprocess flag through coco.py and yolo.py so CPU Albumentations augmentation and normalize are skipped when GPU path is active
  • Add _setup_kornia_pipeline + on_after_batch_transfer to RFDETRDataModule; segmentation models skip GPU aug (phase 2) with a one-time warning
  • Add kornia>=0.7,<1 optional dep group in pyproject.toml
  • 12 new tests across test_module_data.py and test_kornia_transforms.py
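The `collate_boxes`/`unpack_boxes` utilities listed above pad variable-length per-image box lists into a fixed-shape batch plus a validity mask, so a Kornia pipeline can process the whole batch at once. A dependency-free sketch of the idea (list-based for illustration only — the actual implementation works on torch tensors on the target device):

```python
def collate_boxes(targets):
    """Pad each image's [x1, y1, x2, y2] box list to the batch's max length.

    Returns (padded_boxes, valid_mask); padded slots are zero boxes marked
    False in the mask so they can be dropped again after augmentation.
    """
    max_n = max((len(t["boxes"]) for t in targets), default=0)
    padded, valid = [], []
    for t in targets:
        boxes = [list(b) for b in t["boxes"]]
        n = len(boxes)
        padded.append(boxes + [[0.0, 0.0, 0.0, 0.0]] * (max_n - n))
        valid.append([True] * n + [False] * (max_n - n))
    return padded, valid


def unpack_boxes(padded, valid, targets):
    """Write augmented boxes back into shallow copies of the targets,
    dropping the padded entries."""
    out = []
    for boxes, mask, t in zip(padded, valid, targets):
        new_t = dict(t)  # shallow copy, as the PR's docstring fix notes
        new_t["boxes"] = [b for b, keep in zip(boxes, mask) if keep]
        out.append(new_t)
    return out
```

An empty-bbox image simply contributes an all-False mask row, which is what the empty-bbox forward-pass test below exercises.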

Closes #862

Type of Change

  • New feature (non-breaking change that adds functionality)

Testing

  • I have tested this change locally
  • I have added/updated tests for this change


---
Co-authored-by: Claude Code <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 23:30

Copilot AI left a comment


Pull request overview

Adds an opt-in GPU-side augmentation path for detection training. A new augmentation_backend switch routes normalization/augmentation to run after the batch is transferred to the device (via RFDETRDataModule.on_after_batch_transfer), while the existing CPU Albumentations pipeline remains the default.

Changes:

  • Add TrainConfig.augmentation_backend ("cpu" | "auto" | "gpu") and a new optional dependency group kornia.
  • Thread a gpu_postprocess flag through COCO/YOLO dataset builders so CPU Albumentations + Normalize can be skipped when GPU postprocessing is active.
  • Add DataModule logic + tests for backend resolution and the on_after_batch_transfer hook.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/rfdetr/training/module_data.py Adds Kornia pipeline setup and on_after_batch_transfer GPU postprocessing hook.
src/rfdetr/datasets/coco.py Adds gpu_postprocess option to training transforms and wires backend flag through dataset builders.
src/rfdetr/datasets/yolo.py Wires backend flag through Roboflow-from-YOLO builder.
src/rfdetr/config.py Introduces augmentation_backend on TrainConfig.
src/rfdetr/datasets/aug_config.py Documents Kornia GPU backend and Phase 1 limitations.
pyproject.toml Adds optional dependency group kornia.
tests/training/test_module_data.py Adds tests for backend resolution and on_after_batch_transfer.
tests/training/conftest.py Adds autouse fixture to restore RFDETRDataModule.trainer property after tests.
CHANGELOG.md Documents the new augmentation_backend feature.
Comments suppressed due to low confidence (1)

src/rfdetr/datasets/coco.py:356

  • The make_coco_transforms() docstring and Args list no longer match behavior now that gpu_postprocess can skip Albumentations and Normalize() for the train split. Please document the new gpu_postprocess parameter and clarify that normalization is deferred to the DataModule GPU path when it’s enabled.
    """Build the standard COCO transform pipeline for a given dataset split.

    Returns a composed transform that resizes images to the target ``resolution``
    (with optional multi-scale jitter), applies Albumentations-based augmentations
    during training, and normalises pixel values with ImageNet statistics.

    For the ``"train"`` split the pipeline uses a two-branch ``OneOf`` between a
    direct resize and a resize → random-crop → resize sequence (built via
    :func:`_build_train_resize_config`), followed by the augmentation stack and
    normalisation.  For ``"val"``, ``"test"``, and ``"val_speed"`` only resize and
    normalisation are applied — no augmentation.

Borda and others added 4 commits March 26, 2026 00:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…rmalize

- Import get_logger and add module-level logger to o365.py
- Detect augmentation_backend from args; emit WARNING when non-cpu (Phase 1 limitation: no aug_config support for O365)
- Compute gpu_postprocess flag and pass to both make_coco_transforms / make_coco_transforms_square_div_64 calls

Addresses review comment — HIGH blocking: double normalize for O365 users with augmentation_backend != 'cpu' (PR #874)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…RNING

- Add _kornia_setup_done: bool = False in __init__ to prevent _setup_kornia_pipeline re-running on every setup('fit') call when the auto+no-CUDA/no-kornia fallback leaves _kornia_pipeline as None
- Switch the auto+no-CUDA fallback from logger.info to logger.warning (consistent with auto+no-kornia WARNING)

Addresses review comments — MEDIUM: setup guard re-runs in fallback path; inconsistent log levels (PR #874)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- _make_gaussian_blur: enforce blur_limit >= 3 after odd-rounding (Kornia requires kernel_size >= 3)
- make_coco_transforms, make_coco_transforms_square_div_64: add gpu_postprocess to Args docstring
- unpack_boxes: correct docstring claiming in-place mutation (function returns shallow copies)
- conftest.py: fix docstring wording LightningModule → LightningDataModule

Addresses review comments from @Copilot and @review on PR #874

---
Co-authored-by: Claude Code <noreply@anthropic.com>
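The blur-kernel clamp described in the first bullet can be sketched as follows (illustrative, not the exact source — Kornia's GaussianBlur2d requires an odd kernel size of at least 3, while Albumentations allows blur_limit as low as 1):

```python
def kornia_kernel_size(blur_limit: int) -> int:
    """Round blur_limit up to an odd value, then enforce Kornia's minimum of 3."""
    k = blur_limit if blur_limit % 2 == 1 else blur_limit + 1
    return max(k, 3)
```

Without the `max(k, 3)` clamp, blur_limit values of 1 or 2 would produce kernel sizes Kornia rejects, which is exactly what the TestGaussianBlurMinKernel parametrized test below guards against.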
@Borda Borda self-assigned this Mar 26, 2026
Borda and others added 2 commits March 26, 2026 01:06
…pipeline forward pass

- TestGaussianBlurMinKernel: parametrized test for blur_limit=1,2 producing valid kernel_size >= 3
- TestKorniaPipelineForwardPass: shape/dtype check and empty-bbox batch through built pipeline (kornia skip guard)
- TestBuildO365RawGpuBackend: warning emitted for non-cpu backend; gpu_postprocess wired correctly; square-resize delegate
- TestKorniaSetupDoneSentinel: sentinel starts False, set after fit, _setup_kornia_pipeline called exactly once across repeated setup('fit') calls

Closes review test-coverage gaps from PR #874

---
Co-authored-by: Claude Code <noreply@anthropic.com>
When augmentation_backend != 'cpu' and aug_config is not explicitly set,
build_kornia_pipeline was receiving {} (empty dict) while the CPU path
correctly fell back to AUG_CONFIG.  This caused GPU training to have zero
augmentation by default — a silent regression.

- Import AUG_CONFIG in module_data.py
- Use `train_config.aug_config if ... is not None else AUG_CONFIG` in _setup_kornia_pipeline
- Add test_gpu_path_uses_aug_config_fallback to TestBackendResolution

Addresses QA finding B1 (blocking) from post-commit review of PR #874

---
Co-authored-by: Claude Code <noreply@anthropic.com>
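The fallback fix amounts to distinguishing "never set" (None) from "explicitly empty" ({}); a minimal sketch, with `AUG_CONFIG` standing in for the real default dict:

```python
# Illustrative defaults; the real AUG_CONFIG lives in rfdetr.datasets.aug_config.
AUG_CONFIG = {"horizontal_flip": 0.5, "gaussian_blur": 0.1}


def resolve_aug_config(aug_config):
    """Fall back to AUG_CONFIG only when aug_config was never set (None).

    An explicit empty dict must still mean 'no augmentation' — truthiness
    checks (`aug_config or AUG_CONFIG`) would conflate the two cases and
    reintroduce the silent-regression bug described above.
    """
    return aug_config if aug_config is not None else AUG_CONFIG
```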

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 72.92576% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 79%. Comparing base (6fcfe05) to head (26c7c80).

❌ Your patch check has failed because the patch coverage (73%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (79%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #874    +/-   ##
=======================================
- Coverage       79%    79%    -0%     
=======================================
  Files           97     98     +1     
  Lines         7819   8044   +225     
=======================================
+ Hits          6169   6340   +171     
- Misses        1650   1704    +54     


Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Borda and others added 5 commits March 26, 2026 22:06
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…auto backend

- _make_gaussian_blur: kernel_size=(blur_limit, blur_limit) instead of (3, blur_limit) — square kernel per Albumentations semantics (Copilot #2991808605)
- build_normalize: pass plain Python tuples instead of torch.tensor() so Kornia handles device placement (Copilot #2991808573)
- on_after_batch_transfer: call .to(img.device) on pipeline and normalize before use to prevent CPU/GPU device mismatch (Copilot #2991808540)
- setup("fit"): resolve 'auto' backend via _resolve_augmentation_backend() before dataset build so gpu_postprocess matches actual runtime behavior — fixes silent CPU-normalize stripping on machines without CUDA/kornia (Copilot #2991808618, #2991808669)
- 4 new tests covering _resolve_augmentation_backend and namespace pre-resolution

---
Co-authored-by: Claude Code <noreply@anthropic.com>
The try block unconditionally set has_kornia=True with no import, making
the except unreachable; on machines with CUDA but without kornia, auto
would incorrectly resolve to gpu — causing ImportError or unnormalized
training inputs (review HIGH finding).

Also update test_auto_backend_emits_warning and
test_gpu_postprocess_true_for_auto_backend to mock CUDA+kornia availability
so the GPU path is actually exercised; add complementary no-CUDA tests.

---
Co-authored-by: Claude Code <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

boxes_padded, valid = collate_boxes(targets, img.device)
img_aug, boxes_aug = self._kornia_pipeline(img, boxes_padded)
img_aug = self._kornia_normalize(img_aug)
targets = unpack_boxes(boxes_aug, valid, targets, *img_aug.shape[-2:])

Copilot AI Apr 9, 2026


GPU postprocess path currently normalizes images with Kornia but never converts targets[i]["boxes"] from absolute xyxy pixels into normalized cxcywh. In the CPU pipeline this conversion happens inside rfdetr.datasets.transforms.Normalize, and the matcher/criterion expects normalized cxcywh targets. After unpack_boxes(...), convert boxes to cxcywh and divide by [W, H, W, H] (and keep this in sync with any box filtering) so training receives the same target format as the CPU backend.

Suggested change
- targets = unpack_boxes(boxes_aug, valid, targets, *img_aug.shape[-2:])
+ targets = unpack_boxes(boxes_aug, valid, targets, *img_aug.shape[-2:])
+ height, width = img_aug.shape[-2:]
+ for target in targets:
+     boxes = target["boxes"]
+     if boxes.numel() == 0:
+         continue
+     cx = (boxes[:, 0] + boxes[:, 2]) / 2
+     cy = (boxes[:, 1] + boxes[:, 3]) / 2
+     w = boxes[:, 2] - boxes[:, 0]
+     h = boxes[:, 3] - boxes[:, 1]
+     boxes_cxcywh = torch.stack((cx, cy, w, h), dim=-1)
+     scale = torch.tensor([width, height, width, height], dtype=boxes.dtype, device=boxes.device)
+     target["boxes"] = boxes_cxcywh / scale

Comment on lines +119 to +128
if backend != "auto":
    return backend
if not torch.cuda.is_available():
    return "cpu"
try:
    import kornia.augmentation  # noqa: F401

    return "gpu"
except ImportError:
    return "cpu"

Copilot AI Apr 9, 2026


_resolve_augmentation_backend() calls torch.cuda.is_available(), which this repo explicitly tries to avoid in fork-based DDP/notebook strategies because it can initialize a CUDA driver context. Consider using the existing fork-safe device detection (rfdetr.config.DEVICE / torch.accelerator.current_accelerator logic) here (and in _setup_kornia_pipeline) to prevent regressions in ddp_notebook / fork workflows.

Comment on lines 666 to 688
expanded_scales = getattr(args, "expanded_scales", False)
do_random_resize_via_padding = getattr(args, "do_random_resize_via_padding", False)
patch_size = getattr(args, "patch_size", 16)
num_windows = getattr(args, "num_windows", 4)
aug_config = getattr(args, "aug_config", None)
gpu_postprocess = getattr(args, "augmentation_backend", "cpu") != "cpu" and not include_masks

if square_resize_div_64:
    logger.info(f"Building Roboflow {image_set} dataset with square resize at resolution {resolution}")
    dataset = CocoDetection(
        img_folder,
        ann_file,
        transforms=make_coco_transforms_square_div_64(
            image_set,
            resolution,
            multi_scale=multi_scale,
            expanded_scales=expanded_scales,
            skip_random_resize=not do_random_resize_via_padding,
            patch_size=patch_size,
            num_windows=num_windows,
            aug_config=aug_config,
            gpu_postprocess=gpu_postprocess,
        ),

Copilot AI Apr 9, 2026


gpu_postprocess is derived from augmentation_backend != "cpu", which treats augmentation_backend="auto" as GPU even on machines without CUDA/kornia. If build_roboflow_from_coco() is invoked without the DataModule’s prior resolution step, this will incorrectly strip CPU Normalize/augmentation and yield unnormalized training inputs. Resolve "auto" to "cpu"|"gpu" (same logic as _resolve_augmentation_backend) before computing gpu_postprocess.

Comment on lines 706 to 746
expanded_scales = getattr(args, "expanded_scales", None)
do_random_resize_via_padding = getattr(args, "do_random_resize_via_padding", False)
patch_size = getattr(args, "patch_size", None)
num_windows = getattr(args, "num_windows", None)
aug_config = getattr(args, "aug_config", None)
gpu_postprocess = getattr(args, "augmentation_backend", "cpu") != "cpu" and not include_masks

if square_resize_div_64:
    dataset = YoloDetection(
        img_folder=str(img_folder),
        lb_folder=str(lb_folder),
        data_file=str(data_file),
        transforms=make_coco_transforms_square_div_64(
            image_set,
            resolution,
            multi_scale=multi_scale,
            expanded_scales=expanded_scales,
            skip_random_resize=not do_random_resize_via_padding,
            patch_size=patch_size,
            num_windows=num_windows,
            aug_config=aug_config,
            gpu_postprocess=gpu_postprocess,
        ),
        include_masks=include_masks,
    )
else:
    dataset = YoloDetection(
        img_folder=str(img_folder),
        lb_folder=str(lb_folder),
        data_file=str(data_file),
        transforms=make_coco_transforms(
            image_set,
            resolution,
            multi_scale=multi_scale,
            expanded_scales=expanded_scales,
            skip_random_resize=not do_random_resize_via_padding,
            patch_size=patch_size,
            num_windows=num_windows,
            aug_config=aug_config,
            gpu_postprocess=gpu_postprocess,
        ),

Copilot AI Apr 9, 2026


Same issue as COCO Roboflow builder: gpu_postprocess = augmentation_backend != "cpu" treats augmentation_backend="auto" as GPU even when CUDA/kornia are unavailable, which can strip CPU Normalize/augmentation unexpectedly if this builder is called outside the DataModule. Resolve "auto" to an actual backend before deciding gpu_postprocess.

Comment on lines 33 to +66
square_resize_div_64 = getattr(args, "square_resize_div_64", False)
augmentation_backend = getattr(args, "augmentation_backend", "cpu")
resolved_backend = augmentation_backend

if augmentation_backend == "auto":
    # Resolve 'auto' based on CUDA and kornia availability
    has_cuda = False
    has_kornia = False
    try:
        import torch

        has_cuda = bool(torch.cuda.is_available())
    except Exception:
        has_cuda = False

    try:
        import kornia.augmentation  # noqa: F401

        has_kornia = True
    except Exception:
        has_kornia = False

    if has_cuda and has_kornia:
        resolved_backend = "gpu"
    else:
        resolved_backend = "cpu"

if resolved_backend != "cpu":
    logger.warning(
        "O365 dataset does not support custom aug_config in Phase 1 GPU augmentation; "
        "Albumentations augmentation is skipped and normalization runs on GPU. "
        "Pass augmentation_backend='cpu' for full CPU augmentation pipeline with O365."
    )
gpu_postprocess = resolved_backend != "cpu"

Copilot AI Apr 9, 2026


build_o365_raw() resolves only the "auto" backend; augmentation_backend="gpu" is not validated (no CUDA/kornia checks) and will still set gpu_postprocess=True, potentially producing unnormalized inputs if this dataset builder is used outside the Lightning DataModule. Align O365 behavior with the documented backend contract: for "gpu" raise when CUDA is unavailable and raise an ImportError with an install hint when kornia is missing. Also consider narrowing the broad except Exception blocks here to ImportError/RuntimeError so real errors aren’t silently swallowed.
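The backend contract the reviewer describes — fail fast for "gpu", degrade silently for "auto" — could be centralized in a helper along these lines (hypothetical sketch; the availability flags are injected as parameters so the check stays fork-safe and unit-testable, rather than calling torch.cuda.is_available() directly):

```python
def resolve_backend(backend: str, cuda_available: bool, kornia_available: bool) -> str:
    """Resolve an augmentation backend per the documented contract.

    'gpu'  -> raise if CUDA or kornia is missing (fail fast)
    'auto' -> fall back to 'cpu' when either is missing (degrade silently)
    'cpu'  -> always valid
    """
    if backend == "gpu":
        if not cuda_available:
            raise RuntimeError("augmentation_backend='gpu' requires a CUDA device")
        if not kornia_available:
            raise ImportError(
                "kornia is required for GPU augmentation; "
                "install with `pip install 'rfdetr[kornia]'`"
            )
        return "gpu"
    if backend == "auto":
        return "gpu" if (cuda_available and kornia_available) else "cpu"
    return "cpu"
```

Sharing one helper between the DataModule and the standalone dataset builders would also close the "builder invoked outside the DataModule" gap flagged in the earlier comments.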

Comment on lines +996 to +1017
def test_training_true_applies_augmentation(self, tmp_path):
    """When training=True and _kornia_pipeline is set, augmentation is applied."""
    dm = self._build_dm(tmp_path)
    dm = self._attach_mock_trainer(dm, training=True)

    samples, targets = self._make_kornia_batch()
    img_aug = samples.tensors.clone()
    # Mock pipeline returns (augmented_images, augmented_boxes)
    boxes_padded = torch.tensor([[[2.0, 2.0, 10.0, 10.0]]] * 2)
    mock_pipeline = MagicMock(return_value=(img_aug, boxes_padded))
    dm._kornia_pipeline = mock_pipeline

    # Mock normalize to be a passthrough
    dm._kornia_normalize = MagicMock(side_effect=lambda x: x)

    result = dm.on_after_batch_transfer((samples, targets), dataloader_idx=0)

    mock_pipeline.assert_called_once()
    dm._kornia_normalize.assert_called_once()
    assert isinstance(result, tuple)
    assert len(result) == 2

Copilot AI Apr 9, 2026


The new on_after_batch_transfer tests only assert that the pipeline/normalize mocks are called, but they don’t validate the most important contract: output images are normalized and target boxes are in the same format as the CPU pipeline (normalized cxcywh). Add assertions on result_targets[...]["boxes"] (shape/range/format) to catch regressions in the GPU postprocess path.
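One way to make those assertions concrete is to compare the hook's output boxes against a small pure-Python reference conversion (hypothetical helper, mirroring the CPU Normalize semantics of absolute xyxy pixels to normalized cxcywh):

```python
def to_norm_cxcywh(box, width, height):
    """Convert one absolute-pixel [x1, y1, x2, y2] box to normalized [cx, cy, w, h]."""
    x1, y1, x2, y2 = box
    return [
        (x1 + x2) / 2 / width,
        (y1 + y2) / 2 / height,
        (x2 - x1) / width,
        (y2 - y1) / height,
    ]
```

A test could then assert that every `result_targets[i]["boxes"]` row lies in [0, 1] and matches this reference for known inputs, which would catch the missing-conversion regression described in the first review comment.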

Copilot generated this review using guidance from repository custom instructions.


Development

Successfully merging this pull request may close these issues.

GPU augmentation routing via Kornia in PTL DataModule

2 participants